Exploring Asymmetric Clustering for Statistical Language Modeling

Authors

  • Jianfeng Gao
  • Joshua Goodman
  • Guihong Cao
  • Hang Li
Abstract

The n-gram model is a stochastic model that predicts the next word (the predicted word) given the previous words (the conditional words) in a word sequence. The cluster n-gram model is a variant of the n-gram model in which similar words are grouped into the same cluster. It has been demonstrated that using different clusters for predicted and conditional words leads to cluster models that are superior to classical cluster models, which use the same clusters for both. This is the basis of the asymmetric cluster model (ACM) discussed in our study. In this paper, we first present a formal definition of the ACM. We then describe in detail the methodology for constructing the ACM. The effectiveness of the ACM is evaluated on a realistic application, namely Japanese Kana-Kanji conversion. Experimental results show substantial improvements of the ACM over classical cluster models and word n-gram models at the same model size. Our analysis shows that the high performance of the ACM lies in the asymmetry of the model.
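The idea of separate clusterings for predicted and conditional words can be illustrated with a minimal bigram sketch. The abstract does not state the model's formula, so the decomposition used here, P(w | prev) = P(PW(w) | CW(prev)) × P(w | CW(prev), PW(w)), is an assumption, as are the toy word pairs, the hand-made cluster maps, and every function name. In the toy data, "a" and "an" share a predicted cluster because they follow the same words, but have different conditional clusters because they precede different words.

```python
from collections import Counter

# Hypothetical toy data: (conditional word, predicted word) pairs.
BIGRAMS = [("see", "a"), ("see", "an"), ("see", "a"),
           ("a", "cat"), ("a", "dog"), ("an", "owl")]

# Assumed hand-made cluster maps; a real system would learn these from data.
# Predicted-word clusters: "a" and "an" are grouped together.
PREDICTED_CLUSTER = {"a": "DET", "an": "DET",
                     "cat": "ANIMAL", "dog": "ANIMAL", "owl": "ANIMAL"}
# Conditional-word clusters: "a" and "an" are kept apart.
CONDITIONAL_CLUSTER = {"see": "C_VERB", "a": "C_A", "an": "C_AN"}


def train(bigrams):
    """Collect the counts needed for both factors of the assumed decomposition."""
    cw_counts = Counter()       # count(CW(prev))
    cw_pw_counts = Counter()    # count(CW(prev), PW(w))
    cw_pw_w_counts = Counter()  # count(CW(prev), PW(w), w)
    for prev, word in bigrams:
        cw = CONDITIONAL_CLUSTER[prev]
        pw = PREDICTED_CLUSTER[word]
        cw_counts[cw] += 1
        cw_pw_counts[(cw, pw)] += 1
        cw_pw_w_counts[(cw, pw, word)] += 1
    return cw_counts, cw_pw_counts, cw_pw_w_counts


def acm_bigram_prob(word, prev, counts):
    """Unsmoothed maximum-likelihood estimate of P(word | prev)."""
    cw_counts, cw_pw_counts, cw_pw_w_counts = counts
    cw = CONDITIONAL_CLUSTER[prev]
    pw = PREDICTED_CLUSTER[word]
    if cw_pw_counts[(cw, pw)] == 0:
        return 0.0  # unseen cluster pair; a real model would back off here
    p_cluster = cw_pw_counts[(cw, pw)] / cw_counts[cw]
    p_word = cw_pw_w_counts[(cw, pw, word)] / cw_pw_counts[(cw, pw)]
    return p_cluster * p_word


COUNTS = train(BIGRAMS)
print(acm_bigram_prob("owl", "an", COUNTS))
print(acm_bigram_prob("owl", "a", COUNTS))
```

Without smoothing, the two factors multiply out to count(CW(prev), w) / count(CW(prev)), so the predicted cluster cancels. In this sketch the clustering would only matter once each factor is smoothed separately, which is omitted here.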




Publication date: 2002